1 |
Shapley Idioms: Analysing BERT Sentence Embeddings for General Idiom Token Identification
In: Front Artif Intell (2022)
BASE
3 |
English WordNet Taxonomic Random Walk Pseudo-Corpora
In: Conference papers (2020)
BASE
4 |
Language related issues for machine translation between closely related south Slavic languages
BASE
5 |
Synthetic, Yet Natural: Properties of WordNet Random Walk Corpora and the impact of rare words on embedding performance
In: Conference papers (2019)
BASE
6 |
Size Matters: The Impact of Training Size in Taxonomically-Enriched Word Embeddings
In: Articles (2019)
BASE
7 |
Training corpus hr500k 1.0
Abstract:
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also manually annotated with syntactic dependencies. Furthermore, about a fifth of the corpus is annotated with semantic role labels. The annotations (and other aspects) of the hr500k corpus are documented in the teiHeader and back element of the TEI encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications for Croatian, http://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf, while (4) the semantic role labelling annotation guidelines are currently in the publication process.
Keywords:
dependency treebank; manual annotation; named entities; parsing; part-of-speech tagging; semantic role labelling; TEI; tokenisation
URL: http://hdl.handle.net/11356/1183
BASE
8 |
Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian ...
BASE
9 |
Is it worth it? Budget-related evaluation metrics for model selection ...
BASE
10 |
Quantitative Fine-grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian
In: Articles (2018)
BASE
11 |
Is it worth it? Budget-related evaluation metrics for model selection
In: Conference papers (2018)
BASE
12 |
hr500k – A Reference Training Corpus of Croatian
In: Conference papers (2018)
BASE
17 |
Fine-grained human evaluation of neural versus phrase-based machine translation ...
BASE
18 |
Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation
In: Prague Bulletin of Mathematical Linguistics, Vol 108, Iss 1, Pp 121-132 (2017)
BASE